Synthetic Data for Social Good

Authors

  • Bill Howe
  • Julia Stoyanovich
  • Haoyue Ping
  • Bernease Herman
  • Matt Gee

Abstract

Data scientists need access to data to do social good. But data owners must be conservative about how, when, and why they share data, or they risk violating the trust of the people they aim to help, losing their funding, or breaking the law. Data sharing agreements can help prevent privacy violations, but they require a level of specificity that is premature during preliminary discussions, and they can take more than a year to establish. We consider the generation and use of synthetic data to facilitate ad hoc collaborations involving sensitive data. A good synthetic dataset has two properties: it is representative of the original data, and it provides strong guarantees about privacy. In this paper, we 1) discuss use cases for synthetic data that challenge the state of the art in privacy-preserving data generation, and 2) describe DataSynthesizer, a dataset generation tool that takes a sensitive dataset as input and outputs a structurally and statistically similar synthetic dataset with strong privacy guarantees. The data owners need not release their data, while potential collaborators can begin developing models and methods with some confidence that their results will behave similarly on the real dataset. The distinguishing feature of DataSynthesizer is its usability: in most cases, the data owner need not specify any parameters to start generating and sharing data safely and effectively. The code implementing DataSynthesizer is publicly available on GitHub at https://github.com/DataResponsibly. The work on DataSynthesizer is part of the Data, Responsibly project, whose goal is to operationalize responsibility in data sharing, integration, analysis, and use.

∗This work was supported by the University of Washington Information School, Microsoft, the Gordon and Betty Moore Foundation (Award #2013-10-29), and the Alfred P. Sloan Foundation (Award #3835) through the Data Science Environments program.

†This work was supported in part by NSF Grants No. 1741047, 1464327, and 1539856, and BSF Grant No. 2014391.

Bloomberg Data for Good Exchange Conference, 24-Sep-2017, New York City, NY, USA.
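The abstract does not say how DataSynthesizer achieves its privacy guarantee. To make the idea of a "strong privacy guarantee" concrete, here is a minimal sketch, assuming differential privacy via the Laplace mechanism: it releases a noisy histogram for each attribute and samples synthetic rows from those histograms. This is not DataSynthesizer's implementation or API; the function names, the example attributes, and the assumption that each attribute's set of possible values is public are all hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    # expovariate takes a rate (1 / mean), so each draw has mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(values: Sequence[str], domain: Sequence[str],
                       epsilon: float, rng: random.Random) -> List[float]:
    """Epsilon-DP histogram over an assumed-public domain (Laplace mechanism)."""
    counts = Counter(values)
    # Adding or removing one record changes at most one count by 1, so the
    # L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    noisy = [max(0.0, counts[v] + laplace_noise(1.0 / epsilon, rng))
             for v in domain]
    total = sum(noisy)
    # Clipping and normalizing are post-processing and do not weaken privacy.
    if total == 0:
        # Every noisy count was clipped to zero: fall back to uniform.
        return [1.0 / len(domain)] * len(domain)
    return [c / total for c in noisy]


def synthesize_independent(rows: List[Dict[str, str]],
                           domains: Dict[str, Sequence[str]],
                           epsilon: float, n: int,
                           seed: int = 0) -> List[Dict[str, str]]:
    """Hypothetical helper: sample n rows, each attribute drawn independently."""
    rng = random.Random(seed)
    # Sequential composition: split the total budget evenly across attributes.
    eps_per_attr = epsilon / len(domains)
    columns = {}
    for attr, domain in domains.items():
        probs = noisy_distribution([r[attr] for r in rows], domain,
                                   eps_per_attr, rng)
        columns[attr] = rng.choices(list(domain), weights=probs, k=n)
    return [{attr: columns[attr][i] for attr in domains} for i in range(n)]


# Illustrative (made-up) sensitive records and public attribute domains.
sensitive = [
    {"age_band": "18-34", "housing": "renter"},
    {"age_band": "35-54", "housing": "owner"},
    {"age_band": "18-34", "housing": "renter"},
]
domains = {"age_band": ["18-34", "35-54", "55+"], "housing": ["renter", "owner"]}
synthetic = synthesize_independent(sensitive, domains, epsilon=1.0, n=5, seed=42)
for row in synthetic:
    print(row)
```

Sampling attributes independently preserves each column's noisy marginal distribution but discards correlations between columns; capturing those, for example with a Bayesian network over the attributes, needs more machinery. The fixed seed is for reproducibility only; a real release would draw noise from a secure randomness source rather than `random.Random`.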

Similar resources

Why You Should Charge Your Friends for Borrowing Your Stuff

We consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops from an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models where each node decides whether to buy the good or free ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the soc...

Robots as Social Actors: Aurora and the Case of Autism

This paper discusses the role of predictability and control in robot-human interaction. This involves the central question whether humans are good models for synthetic (social) agents. Design issues based on cognitive accounts towards robot-human interaction are discussed with respect to the author’s recent work on building interactive robotic systems as remedial tools (teaching devices) for ch...

An Effective Method for Utility Preserving Social Network Graph Anonymization Based on Mathematical Modeling

In recent years, privacy concerns about social network graph data publishing have increased due to the widespread use of such data for research purposes. This paper addresses the problem of identity disclosure risk of a node assuming that the adversary identifies one of its immediate neighbors in the published data. The related anonymity level of a graph is formulated and a mathematical model is...

Investigation and Testing of the Agreement Between Synthetic and Natural Unit Hydrographs in the Zayandehrud Dam Watershed (Pelasjan Sub-basin)

Since the unit hydrograph is an important tool for estimating river floods, and since deriving one requires a flood hydrograph and a simultaneous rainfall hyetograph, hydrologists recommend synthetic unit hydrographs for areas lacking these hydrometeorological data. A study was conducted in the Zayandehrud-dam watershed (Pelasjan sub-basin) to test the efficiency of synthetic unit ...

Journal:
  • CoRR

Volume: abs/1710.08874

Publication date: 2017